Xiangyu Yue

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Jun 01, 2026
Via arXiv

$τ_0$-WM: A Unified Video-Action World Model for Robotic Manipulation

May 31, 2026
Via arXiv

Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation

May 20, 2026
Via arXiv

BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

May 12, 2026
Via arXiv

From Web to Pixels: Bringing Agentic Search into Visual Perception

May 12, 2026
Via arXiv

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

May 06, 2026
Via arXiv

A Progressive Training Strategy for Vision-Language Models to Counteract Spatio-Temporal Hallucinations in Embodied Reasoning

Apr 12, 2026
Via arXiv

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

Apr 02, 2026
Via arXiv

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Mar 30, 2026
Via arXiv

GIDE: Unlocking Diffusion LLMs for Precise Training-Free Image Editing

Mar 22, 2026
Via arXiv